<?xml version="1.0" encoding="ascii"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>bkn.bibtex.new_merge_tools</title>
  <link rel="stylesheet" href="epydoc.css" type="text/css" />
  <script type="text/javascript" src="epydoc.js"></script>
</head>

<body bgcolor="white" text="black" link="blue" vlink="#204080"
      alink="#204080">
<!-- ==================== NAVIGATION BAR ==================== -->
<table class="navbar" border="0" width="100%" cellpadding="0"
       bgcolor="#a0c0ff" cellspacing="0">
  <tr valign="middle">
  <!-- Home link -->
      <th>&nbsp;&nbsp;&nbsp;<a
        href="bkn.bibtex-module.html">Home</a>&nbsp;&nbsp;&nbsp;</th>

  <!-- Tree link -->
      <th>&nbsp;&nbsp;&nbsp;<a
        href="module-tree.html">Trees</a>&nbsp;&nbsp;&nbsp;</th>

  <!-- Index link -->
      <th>&nbsp;&nbsp;&nbsp;<a
        href="identifier-index.html">Indices</a>&nbsp;&nbsp;&nbsp;</th>

  <!-- Help link -->
      <th>&nbsp;&nbsp;&nbsp;<a
        href="help.html">Help</a>&nbsp;&nbsp;&nbsp;</th>

  <!-- Project homepage -->
      <th class="navbar" align="right" width="100%">
        <table border="0" cellpadding="0" cellspacing="0">
          <tr><th class="navbar" align="center"
            ><a class="navbar" target="_top" href="http://code.google.com/p/bibkn/">bkn.bibtex</a></th>
          </tr></table></th>
  </tr>
</table>
<table width="100%" cellpadding="0" cellspacing="0">
  <tr valign="top">
    <td width="100%">
      <span class="breadcrumbs">
        Package&nbsp;bkn ::
        <a href="bkn.bibtex-module.html">Package&nbsp;bibtex</a> ::
        Module&nbsp;new_merge_tools
      </span>
    </td>
    <td>
      <table cellpadding="0" cellspacing="0">
        <!-- hide/show private -->
        <tr><td align="right"><span class="options">[<a href="javascript:void(0);" class="privatelink"
    onclick="toggle_private();">hide&nbsp;private</a>]</span></td></tr>
        <tr><td align="right"><span class="options"
            >[<a href="frames.html" target="_top">frames</a
            >]&nbsp;|&nbsp;<a href="bkn.bibtex.new_merge_tools-module.html"
            target="_top">no&nbsp;frames</a>]</span></td></tr>
      </table>
    </td>
  </tr>
</table>
<!-- ==================== MODULE DESCRIPTION ==================== -->
<h1 class="epydoc">Module new_merge_tools</h1><p class="nomargin-top"><span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html">source&nbsp;code</a></span></p>
<p>A tool for more efficient deduplication when merging 
  bibliographies.</p>
  <p>When merging bibliographies, it is not uncommon to find that multiple 
  entries in the sources represent the same item. These duplicates must be 
  detected so that the duplicate entries may be merged. A naive approach to
  deduplication might involve many repetitive calculations. This module 
  contains the <a href="bkn.bibtex.new_merge_tools.Fingerprint-class.html" 
  class="link">Fingerprint</a> class which implements one strategy for 
  simplifying the process. A Fingerprint object is calculated for each 
  entry in each of the source bibliographies. Deduplication then proceeds 
  by comparing fingerprints instead of by comparing the entries 
  directly.</p>
  <p>The Fingerprint class provided here is relatively simplistic. Users 
  are encouraged to create subclasses to meet their needs.</p>
  <p>A well-implemented Fingerprint subclass will allow quick comparisons 
  between fingerprints. Any expensive or potentially repetitive 
  calculations should be performed during fingerprint creation.</p>
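  <p>The advice above can be illustrated with a minimal sketch. It does 
  not use or reproduce the module's <code>Fingerprint</code> class: the 
  class name <code>SketchFingerprint</code>, its <code>compare</code> 
  method, and the representation of entries as plain dictionaries are all 
  assumptions made for illustration. Normalization is done once in the 
  constructor so that each later comparison is only a cheap set 
  operation.</p>
<pre class="literalblock">
```python
import re


class SketchFingerprint(object):
    """Hypothetical stand-in, not the module's Fingerprint class."""

    def __init__(self, entry):
        # Potentially repetitive normalization happens once, at creation time.
        title = entry.get("title", "").lower()
        self.tokens = frozenset(re.findall(r"[a-z0-9]+", title))
        self.year = str(entry.get("year", "")).strip()

    def compare(self, other):
        """Return a confidence between 0 and 100 that both entries match."""
        # Two known but different years rule out a match immediately.
        if self.year and other.year and self.year != other.year:
            return 0
        union = self.tokens | other.tokens
        if not union:
            return 0
        shared = self.tokens.intersection(other.tokens)
        return 100 * len(shared) // len(union)


# Entries are plain dicts here purely for the sake of the example.
a = SketchFingerprint({"title": "The Art of Computer Programming", "year": "1968"})
b = SketchFingerprint({"title": "the art of computer programming.", "year": 1968})
c = SketchFingerprint({"title": "Concrete Mathematics", "year": "1989"})
```
</pre>
  <p>In this sketch <code>a.compare(b)</code> evaluates to 100 because the 
  normalized titles and years agree, while <code>a.compare(c)</code> 
  evaluates to 0 because the years differ.</p>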

<!-- ==================== CLASSES ==================== -->
<a name="section-Classes"></a>
<table class="summary" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr bgcolor="#70b0f0" class="table-header">
  <td colspan="2" class="table-header">
    <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr valign="top">
        <td align="left"><span class="table-header">Classes</span></td>
        <td align="right" valign="top"
         ><span class="options">[<a href="#section-Classes"
         class="privatelink" onclick="toggle_private();"
         >hide private</a>]</span></td>
      </tr>
    </table>
  </td>
</tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
        <a href="bkn.bibtex.new_merge_tools.Fingerprint-class.html" class="summary-name">Fingerprint</a>
    </td>
  </tr>
</table>
<!-- ==================== FUNCTIONS ==================== -->
<a name="section-Functions"></a>
<table class="summary" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr bgcolor="#70b0f0" class="table-header">
  <td colspan="2" class="table-header">
    <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr valign="top">
        <td align="left"><span class="table-header">Functions</span></td>
        <td align="right" valign="top"
         ><span class="options">[<a href="#section-Functions"
         class="privatelink" onclick="toggle_private();"
         >hide private</a>]</span></td>
      </tr>
    </table>
  </td>
</tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type"><code>None</code></span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a href="bkn.bibtex.new_merge_tools-module.html#suggest_duplicates" class="summary-sig-name">suggest_duplicates</a>(<span class="summary-sig-arg">bib_list</span>,
        <span class="summary-sig-arg">filename</span>,
        <span class="summary-sig-arg">fp_class</span>=<span class="summary-sig-default">Fingerprint</span>,
        <span class="summary-sig-arg">threshold</span>=<span class="summary-sig-default">50</span>)</span><br />
      For each entry in each bib, suggests entries of other bibs that may 
      duplicate it.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#suggest_duplicates">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a href="bkn.bibtex.new_merge_tools-module.html#simple_confirmation_ui" class="summary-sig-name">simple_confirmation_ui</a>(<span class="summary-sig-arg">filename</span>)</span><br />
      A simple confirmation UI using curses.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#simple_confirmation_ui">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr class="private">
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a name="_simple_confirmation_ui"></a><span class="summary-sig-name">_simple_confirmation_ui</span>(<span class="summary-sig-arg">stdscr</span>,
        <span class="summary-sig-arg">filename</span>)</span><br />
      The actual work of simple_confirmation_ui() is done here.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#_simple_confirmation_ui">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a href="bkn.bibtex.new_merge_tools-module.html#merge_bibs" class="summary-sig-name">merge_bibs</a>(<span class="summary-sig-arg">bibs</span>,
        <span class="summary-sig-arg">suggestion_file</span>)</span><br />
      Merges the listed bibliographies using any accepted same-as 
      assertions from the supplied suggestion file.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#merge_bibs">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a name="merge_bibs_by_id"></a><span class="summary-sig-name">merge_bibs_by_id</span>(<span class="summary-sig-arg">bibs</span>)</span></td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#merge_bibs_by_id">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr class="private">
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a name="_pseudo_id"></a><span class="summary-sig-name">_pseudo_id</span>(<span class="summary-sig-arg">record</span>)</span><br />
      Returns the tuple (dataset_id, record_id) for the supplied record.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#_pseudo_id">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
<tr class="private">
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
      <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
          <td><span class="summary-sig"><a name="_global_id_from_component_ids"></a><span class="summary-sig-name">_global_id_from_component_ids</span>(<span class="summary-sig-arg">dataset_id</span>,
        <span class="summary-sig-arg">record_id</span>)</span><br />
      Returns a global id from the dataset_id and record_id of a record.</td>
          <td align="right" valign="top">
            <span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#_global_id_from_component_ids">source&nbsp;code</a></span>
            
          </td>
        </tr>
      </table>
      
    </td>
  </tr>
</table>
<!-- ==================== VARIABLES ==================== -->
<a name="section-Variables"></a>
<table class="summary" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr bgcolor="#70b0f0" class="table-header">
  <td colspan="2" class="table-header">
    <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr valign="top">
        <td align="left"><span class="table-header">Variables</span></td>
        <td align="right" valign="top"
         ><span class="options">[<a href="#section-Variables"
         class="privatelink" onclick="toggle_private();"
         >hide private</a>]</span></td>
      </tr>
    </table>
  </td>
</tr>
<tr>
    <td width="15%" align="right" valign="top" class="summary">
      <span class="summary-type">&nbsp;</span>
    </td><td class="summary">
        <a name="PREFERRED_ENCODING"></a><span class="summary-name">PREFERRED_ENCODING</span> = <code title="locale.getpreferredencoding()">locale.getpreferredencoding()</code>
    </td>
  </tr>
</table>
<!-- ==================== FUNCTION DETAILS ==================== -->
<a name="section-FunctionDetails"></a>
<table class="details" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr bgcolor="#70b0f0" class="table-header">
  <td colspan="2" class="table-header">
    <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr valign="top">
        <td align="left"><span class="table-header">Function Details</span></td>
        <td align="right" valign="top"
         ><span class="options">[<a href="#section-FunctionDetails"
         class="privatelink" onclick="toggle_private();"
         >hide private</a>]</span></td>
      </tr>
    </table>
  </td>
</tr>
</table>
<a name="suggest_duplicates"></a>
<div>
<table class="details" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr><td>
  <table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr valign="top"><td>
  <h3 class="epydoc"><span class="sig"><span class="sig-name">suggest_duplicates</span>(<span class="sig-arg">bib_list</span>,
        <span class="sig-arg">filename</span>,
        <span class="sig-arg">fp_class</span>=<span class="sig-default">Fingerprint</span>,
        <span class="sig-arg">threshold</span>=<span class="sig-default">50</span>)</span>
  </h3>
  </td><td align="right" valign="top"
    ><span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#suggest_duplicates">source&nbsp;code</a></span>&nbsp;
    </td>
  </tr></table>
  
  <p>For each entry in each bib, suggests entries of other bibs that may 
  duplicate it.</p>
  <dl class="fields">
    <dt>Parameters:</dt>
    <dd><ul class="nomargin-top">
        <li><strong class="pname"><code>bib_list</code></strong> - a list of <a 
          href="bkn.bibtex.bibliography.Bibliography-class.html" 
          class="link">Bibliographies</a> which may contain duplicate 
          entries. Note that this method assumes that there are no 
          duplicates within any given <code>Bibliography</code>; duplicates
          are assumed to exist between bibliographies only.</li>
        <li><strong class="pname"><code>filename</code></strong> - the filename or path to use when saving suggestions. If a file 
          already exists at the given location it will be opened and any 
          existing suggestions in the file will be used as a starting point.</li>
        <li><strong class="pname"><code>fp_class</code></strong> (<a href="bkn.bibtex.new_merge_tools.Fingerprint-class.html" 
          class="link">Fingerprint</a>) - a <code>Fingerprint</code> class which will be used for comparing
          entries. Defaults to the base <code>Fingerprint</code> class.</li>
        <li><strong class="pname"><code>threshold</code></strong> - a number between 0 and 100. This method will suggest that two 
          records are the same if their fingerprints compare with a 
          confidence greater than this value.</li>
    </ul></dd>
    <dt>Returns: <code>None</code></dt>
  </dl>
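  <p>The role of <code>threshold</code> and the "between bibliographies 
  only" assumption can be sketched as follows. This is not the source of 
  <code>suggest_duplicates</code>: the helper names 
  <code>suggest_pairs</code> and <code>word_overlap</code>, the use of 
  dictionaries mapping ids to titles, and the returned list are all 
  hypothetical, and the sketch writes nothing to a file.</p>
<pre class="literalblock">
```python
from itertools import combinations


def suggest_pairs(bibs, compare, threshold=50):
    """Hypothetical helper: collect cross-bibliography pairs above threshold."""
    suggestions = []
    # Pair up distinct bibliographies only; entries within one
    # bibliography are never compared with each other.
    for (i, left), (j, right) in combinations(enumerate(bibs), 2):
        for left_id in sorted(left):
            for right_id in sorted(right):
                score = compare(left[left_id], right[right_id])
                # Strictly greater, matching "confidence greater than this value".
                if score > threshold:
                    suggestions.append(((i, left_id), (j, right_id), score))
    return suggestions


def word_overlap(a, b):
    """Toy comparison: percentage of shared lowercase words (0 to 100)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return 100 * len(wa.intersection(wb)) // len(union) if union else 0


bib_a = {"knuth68": "The Art of Computer Programming"}
bib_b = {"k1": "Art of Computer Programming", "g89": "Concrete Mathematics"}
pairs = suggest_pairs([bib_a, bib_b], word_overlap, threshold=50)
```
</pre>
  <p>Here <code>pairs</code> holds a single suggestion linking 
  <code>knuth68</code> to <code>k1</code> with a score of 80. Raising 
  <code>threshold</code> to 80 or above would drop it, since only scores 
  strictly greater than the threshold are kept.</p>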
</td></tr></table>
</div>
<a name="simple_confirmation_ui"></a>
<div>
<table class="details" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr><td>
  <table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr valign="top"><td>
  <h3 class="epydoc"><span class="sig"><span class="sig-name">simple_confirmation_ui</span>(<span class="sig-arg">filename</span>)</span>
  </h3>
  </td><td align="right" valign="top"
    ><span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#simple_confirmation_ui">source&nbsp;code</a></span>&nbsp;
    </td>
  </tr></table>
  
  <p>A simple confirmation UI using curses.</p>
  <dl class="fields">
  </dl>
<div class="fields">      <p><strong>To Do:</strong>
        Consider extracting this behavior into a new ConfirmationUI class.
      </p>
</div></td></tr></table>
</div>
<a name="merge_bibs"></a>
<div>
<table class="details" border="1" cellpadding="3"
       cellspacing="0" width="100%" bgcolor="white">
<tr><td>
  <table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr valign="top"><td>
  <h3 class="epydoc"><span class="sig"><span class="sig-name">merge_bibs</span>(<span class="sig-arg">bibs</span>,
        <span class="sig-arg">suggestion_file</span>)</span>
  </h3>
  </td><td align="right" valign="top"
    ><span class="codelink"><a href="bkn.bibtex.new_merge_tools-pysrc.html#merge_bibs">source&nbsp;code</a></span>&nbsp;
    </td>
  </tr></table>
  
  <p>Merges the listed bibliographies using any accepted same-as assertions
  from the supplied suggestion file.</p>
  <dl class="fields">
  </dl>
<div class="fields">      <strong>To Do:</strong>
      <ul class="nomargin-top">
        <li>
        This method needs to be revised so as to assign ids in a consistent 
    manner, i.e. the same id should be assigned to the same record on 
    multiple passes.
        </li>
        <li>
        This method currently fails since it may create a bib with duplicate 
    ids during intermediate stages. This must be fixed.
        </li>
      </ul>
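      <p>One possible direction for the first item, offered only as a 
      sketch under stated assumptions, is to derive ids from record 
      content instead of from processing order. The helper names 
      <code>stable_global_id</code> and <code>stable_merged_id</code> are 
      hypothetical and do not describe how this module currently assigns 
      ids. The only input assumed is the <code>(dataset_id, 
      record_id)</code> pair that identifies each source record.</p>
<pre class="literalblock">
```python
import hashlib


def stable_global_id(dataset_id, record_id):
    """Hypothetical helper: the same inputs always map to the same id."""
    # A control-character separator keeps ("ab", "c") and ("a", "bc")
    # from producing the same key.
    key = u"{0}\x1f{1}".format(dataset_id, record_id)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def stable_merged_id(pseudo_ids):
    """Hypothetical helper: id for a group of (dataset_id, record_id) pairs."""
    # Sorting makes the result independent of the order in which
    # the duplicates were discovered or merged.
    parts = [u"{0}\x1f{1}".format(d, r) for d, r in sorted(pseudo_ids)]
    key = u"\x1e".join(parts)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
```
</pre>
      <p>Because both helpers are pure functions of their arguments, 
      repeated passes over the same records yield the same ids. The 
      12-character truncation is an arbitrary choice that trades 
      collision resistance for readability.</p>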
</div></td></tr></table>
</div>
<br />
<table border="0" cellpadding="0" cellspacing="0" width="100%">
  <tr>
    <td align="left" class="footer">
    Generated by Epydoc 3.0.1 on Sun Jul  4 23:10:58 2010
    </td>
    <td align="right" class="footer">
      <a target="mainFrame" href="http://epydoc.sourceforge.net"
        >http://epydoc.sourceforge.net</a>
    </td>
  </tr>
</table>

<script type="text/javascript">
  <!--
  // Private objects are initially displayed (because if
  // javascript is turned off then we want them to be
  // visible); but by default, we want to hide them.  So hide
  // them unless we have a cookie that says to show them.
  checkCookie();
  // -->
</script>
</body>
</html>
